An Empirical Study of Smoothing Techniques for Language Modeling
Authors
Stanley
Abstract
We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.
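As a rough illustration of the evaluation described above, the following Python sketch interpolates a maximum-likelihood bigram estimate with an add-one unigram estimate and scores test data by cross-entropy in bits per word. It is not the method of the paper: the fixed weight `lam`, the add-one unigram, the single reserved unknown-word slot in `vocab_size`, and all function names are assumptions made for this example. Jelinek-Mercer smoothing proper estimates the interpolation weights from held-out data rather than fixing them.

```python
import math
from collections import Counter

def train_counts(tokens):
    """Collect the counts needed for an interpolated bigram model."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # How often each word appears as a bigram history (every token but the last).
    histories = Counter(tokens[:-1])
    # Assumption: reserve one extra vocabulary slot for all unseen words.
    vocab_size = len(unigrams) + 1
    return unigrams, bigrams, histories, len(tokens), vocab_size

def interpolated_prob(word, prev, counts, lam=0.7):
    """P(word | prev) = lam * P_ML(word | prev) + (1 - lam) * P_add1(word)."""
    unigrams, bigrams, histories, total, vocab_size = counts
    # Add-one unigram estimate, so unseen words keep non-zero probability.
    p_uni = (unigrams[word] + 1) / (total + vocab_size)
    if histories[prev] == 0:
        # Unseen history: no bigram evidence, fall back to the unigram estimate.
        return p_uni
    p_bi = bigrams[(prev, word)] / histories[prev]
    return lam * p_bi + (1 - lam) * p_uni

def cross_entropy(test_tokens, counts, lam=0.7):
    """Average negative log2 probability per predicted word of the test data."""
    pairs = list(zip(test_tokens, test_tokens[1:]))
    log_sum = sum(math.log2(interpolated_prob(w, p, counts, lam)) for p, w in pairs)
    return -log_sum / len(pairs)

if __name__ == "__main__":
    train = "the cat sat on the mat".split()
    test = "the cat sat on the rug".split()
    counts = train_counts(train)
    print(cross_entropy(test, counts))
```

A lower cross-entropy means the model assigns higher probability to the test data, which is the sense in which smoothing methods are compared in the abstract.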
Similar resources
Applying Empirical Bayes to Prepare a Geographical Map of Tuberculosis Incidence in Mazandaran Province during 1384-90
Background and purpose: With the growing volume of information about illnesses and deaths, classified maps are an appropriate method for analyzing this type of data. Standardized infection rates are commonly used in disease mapping but have many defects. This study aimed to compare the Poisson regression models and empirical Bayes models to prepare geographical map of tuberculosis incidence in Mazanda...
A Smoothing Technique for the Minimum Norm Solution of Absolute Value Equation
One of the issues that has been considered by the researchers in terms of theory and practice is the problem of finding minimum norm solution. In fact, in general, absolute value equation may have infinitely many solutions. In such cases, the best and most natural choice is the solution with the minimum norm. In this paper, the minimum norm-1 solution of absolute value equation is investigated. ...
Evidence on Asset Sales and Income Management: Case of Iran
This study empirically examines whether managers manipulate reported income through the timing of sales of long-lived assets and investments. Several empirical implications of the income-smoothing and debt-equity hypothesis in the context of asset sales were tested. The findings are consistent with the timing of asset sales by managers so that the recognized accounting income from these sales s...
Prediction of global sea cucumber capture production based on the exponential smoothing and ARIMA models
Sea cucumber catch has followed “boom-and-bust” patterns over the period of 60 years from 1950-2010, and sea cucumber fisheries have had important ecological, economic and societal roles. However, sea cucumber fisheries have not been explored systematically, especially in terms of catch change trends. Sea cucumbers are relatively sedentary species. An attempt was made to explore whether the tim...
Least Squares Techniques for GPS Receivers Positioning Filter using Pseudo-range and Carrier Phase Measurements
In the present study, using the Least Squares (LS) method, we perform position smoothing in a single-frequency GPS receiver by means of pseudo-range and carrier phase measurements. Using pseudo-range or carrier phase measurements separately for GPS receiver positioning can lead to defects. By means of pseudo-range data, we have position with less precision and more distortion. By use of...